Accelerating Rescaled Gradient Descent: Fast Optimization of Smooth Functions

Wilson, Ashia C., Mackey, Lester, Wibisono, Andre

Neural Information Processing Systems

We present a family of algorithms, called descent algorithms, for optimizing convex and non-convex functions. We also introduce a new first-order algorithm, called rescaled gradient descent (RGD), and show that RGD achieves a faster convergence rate than gradient descent provided the function is strongly smooth, a natural generalization of the standard smoothness assumption on the objective function. When the objective function is convex, we present two frameworks for "accelerating" descent methods, one in the style of Nesterov and the other in the style of Monteiro and Svaiter. Rescaled gradient descent can be accelerated under the same strong smoothness assumption using both frameworks. We provide several examples of strongly smooth loss functions in machine learning and numerical experiments that verify our theoretical findings.
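The abstract does not spell out the update rule, but the core idea of RGD is to rescale each gradient step by a power of the gradient norm. The following is a minimal illustrative sketch of that idea; the step size, the exponent parameterization by `p`, and the stopping tolerance are assumptions for illustration, not values taken from the paper.

```python
import numpy as np

def rescaled_gradient_descent(grad, x0, p=4, step=0.1, iters=100, eps=1e-12):
    """Illustrative sketch of a rescaled gradient step.

    For p = 2 the rescaling exponent (p-2)/(p-1) is zero and this
    reduces to plain gradient descent. For p > 2 the step is divided
    by ||grad||**((p-2)/(p-1)), damping steps where the gradient is
    large. The step-size choice here is a placeholder, not the
    paper's tuning.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad(x)
        norm = np.linalg.norm(g)
        if norm < eps:  # (near-)stationary point: stop
            break
        x = x - step * g / norm ** ((p - 2) / (p - 1))
    return x

# Example: minimize f(x) = sum(x**4) / 4, whose gradient is x**3 elementwise.
x_min = rescaled_gradient_descent(lambda x: x**3, np.array([1.0, -2.0]), p=4)
```

With `p=2` and the gradient of a quadratic, the iteration is exactly gradient descent with the given step size, which is a quick sanity check on the rescaling exponent.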


Reviews: Accelerating Rescaled Gradient Descent: Fast Optimization of Smooth Functions

Neural Information Processing Systems

I think the first part of the paper has very good original contributions, with correct and nicely written proofs in the appendix. However, I have the following questions regarding the parts of the paper starting at Section 3. Sorry if these are redundant questions with obvious answers that I missed. The RGD framework is mentioned for both convex and non-convex functions (Lemma 4 doesn't require f to be convex). However, the examples provided are all convex functions, and the focus also seems to be quite heavily on convex functions (since no papers on non-convex optimization are compared against). Do the authors have (1) theoretical results and comparisons with existing work and/or (2) experiments for non-convex functions?